ABSTRACT



A neural net architecture provides for the recognition 
of an input signal which is a rate variant of a learned 
signal pattern, reducing the neural net training require- 
ments. The duration of a digital sampling of the input 
signal is scaled by a time-scaling network, creating a 
multiplicity of scaled signals which are then compared 
to memorized signal patterns contained in a self-organ- 
izing feature map. The feature map outputs values 
which indicate how well the scaled input signals match 
various learned signal patterns. A comparator deter- 
mines which one of the values is greatest, thus indicat- 
ing a best match between the input signal and one of the 
learned signal patterns. 

7 Claims, 2 Drawing Sheets
NEURAL NET ARCHITECTURE FOR
RATE-VARYING INPUTS



BACKGROUND OF THE INVENTION

The present invention relates, in general, to neural net 
architecture, and more particularly, to a neural net 
architecture which minimizes training time by enabling
a feature map to recognize input signals that are rate
variants of a previously learned signal pattern.

Advancements in neural net architecture have made
neural nets the technology of choice for such advanced
artificial intelligence applications as speech recognition
and real-time handwriting recognition. Such advanced
functions as speaker verification and signature verifica-
tion may potentially be implemented using neural nets.
There are, however, many problems to be overcome in 
these areas, not the least of which is the rate at which a 
speaker speaks, or a writer writes. 

In the past, speech and handwriting recognition were
approached in a number of ways. Dynamic Program-
ming, described by Silverman, H. F. and Morgan, D. P.,
"The Application of Dynamic Programming to Con-
nected Speech Recognition", IEEE ASSP Magazine,
July 1990, pp 6-25, was a statistical approach which
relied upon forward search with back-tracking to deter-
mine the probability that a given input corresponded to
a certain pattern. Dynamic Programming was further
refined using Hidden Markov Models as described by
Picone, Joseph, "Continuous Speech Recognition
Using Hidden Markov Models", IEEE ASSP Maga-
zine, July 1990, pp 26-41. These were software imple-
mentations which required a long time to train to recog-
nize varied inputs. Another approach was described by
Tank, D. W. and Hopfield, "Concentrating Informa-
tion in Time: Analog Neural Networks with Applica-
tions to Speech Recognition Problems", Procedures of
the IEEE Conference on Neural Networks, San Diego,
Jun. 21-24, 1987, pp IV455-IV468. Though this work
demonstrated the applicability of neural nets to speech
recognition, a practical application of the approach
required a vast commitment of hardware. The pre-
wired analog nets could only recognize the exact pat-
tern for which they were wired. In order to overcome
this shortcoming, additional circuitry for every possible
variation of each input had to be added.

Advances in digital implementations of neural net-
works used less hardware than required by the Tank
and Hopfield approach. However, the need for a feature
map with a memorized pattern for each potential input
limited the ability of the system to recognize variants in
speaking rates. It was necessary to train the system to
recognize each new input rate as it was encountered.
This became very time consuming. Also, the feature
map, and thus the memory requirements of the system,
quickly multiplied to unwieldy proportions.

SUMMARY OF THE INVENTION 

The objects and advantages of the present invention
are provided by a neural net architecture which pro-
vides for the recognition of an input signal which is a
rate variant of a learned signal pattern. The duration of
a digital sampling of the input signal is scaled by a time-
scaling network, creating a multiplicity of scaled signals
which are then compared to memorized signal patterns
contained in a self-organizing feature map. The feature
map outputs values which indicate how well the scaled
input signals match various learned signal patterns. A
comparator determines which one of the values is great-
est, thus indicating a best match between the input sig-
nal and one of the learned signal patterns.



BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram illustrating an embodiment of
the present invention; 

FIG. 2 is a block diagram of a time-scale circuit
which may be used as a part of the present invention; 

FIG. 3 is a simplified illustration of a feature map 
with a delay line which may be used as a part of the
present invention; and 

FIG. 4 is a block diagram of an alternate embodiment
of the present invention. 

DETAILED DESCRIPTION OF THE 
DRAWINGS 

FIG. 1 is a block diagram of the present invention as 
applied to speech recognition. Analog input signal 11 is 
sampled by analog-to-digital (A/D) converter 12, creat-
ing a digital image of input 11, represented in FIG. 1 by
signal 13. Signal 13 as output by A/D converter 12 is a
multi-bit signal, typically of eight to sixteen bits. In the
case of speech recognition, signal 13 is passed through 
fast Fourier transform circuit 14 to transform signal 13 
from the time domain to the frequency domain as repre- 
sented by signal 16. The output of fast Fourier trans- 
form circuit 14 is a multi-channel, multi-bit signal, with 
each channel representing a range of frequencies found 
in input signal 11. Signal 16, then, represents a family of
outputs which describe the frequency characteristics of 
input signal 11. 
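
The front end described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (quantize, dft_magnitudes, channels), the 8-bit width, the 16-sample frame, and the four equal-width channels are all hypothetical choices, and a naive discrete Fourier transform stands in for fast Fourier transform circuit 14.

```python
import cmath
import math

def quantize(samples, bits=8):
    """Map analog values in [-1.0, 1.0] to signed integers of the given width."""
    top = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * top) for s in samples]

def dft_magnitudes(frame):
    """Magnitude of each non-negative frequency bin of one frame (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def channels(frame, n_channels=4):
    """Sum adjacent bins so that each channel covers a range of frequencies."""
    mags = dft_magnitudes(frame)[1:]  # drop the DC bin
    width = len(mags) // n_channels
    return [sum(mags[c * width:(c + 1) * width]) for c in range(n_channels)]

# Hypothetical input: a 16-sample frame holding a tone in frequency bin 2.
analog = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
digital = quantize(analog)   # stand-in for signal 13
bands = channels(digital)    # stand-in for signal 16
```

With this input, the lowest channel (bins 1 and 2) carries almost all of the energy, which is the kind of per-channel description of the input that signal 16 is said to provide.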

Time-scale section 17 expands and compresses the 
duration of signal 16 by a set of ratios. Multiple signals 
18, 18' and 18", each representing signal 16 lasting for a 
differently scaled duration, are passed to feature map 
19. The object is that input signal 11 may represent a
certain phoneme, such as "a". Neural net feature map 19
has been trained to recognize learned pattern 21 as
phoneme "e" and learned pattern 22 as phoneme "a".
Learned pattern 22 is similar to signal 16, except that
signal 16 resulted from phoneme "a" being spoken more
slowly than when the neural net was trained, establish-
ing learned pattern 22 as a recognizable "a". Signal 16
will thus not be recognized as an "a". Instead of present-
ing signal 16 to feature map 19, time-scaled signals 18,
18' and 18" are presented sequentially. The ratios shown
are examples only, and are not to be construed as con-
straints as to the ratios attainable. Ideally, the more
differently ratioed signals 18, 18' and 18" that are output
by time-scale section 17, the greater the opportunity for
feature map 19 to recognize a signal. Ratios of 2:1 and
0.5:1, however, do present practical upper and lower
limits to the amount of scaling that can be realized with-
out loss of fidelity to signal 16.
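
As a rough illustration of producing several differently scaled copies of one signal, the Python sketch below stretches or compresses a sequence by plain linear interpolation. The function name stretch, the sample data, and the choice to reject ratios outside 0.5:1 to 2:1 with an error are assumptions made for the example; the text only calls those ratios practical limits, and the circuit of FIG. 2 is not what is modeled here.

```python
def stretch(signal, ratio):
    """Scale the duration of a sequence by `ratio` using linear interpolation.

    ratio > 1 lengthens the signal, ratio < 1 shortens it.
    """
    if not 0.5 <= ratio <= 2.0:
        # Assumption: treat the practical limits as hard limits.
        raise ValueError("ratio outside the 0.5:1 to 2:1 range")
    n_out = max(2, round(len(signal) * ratio))
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis.
        pos = i * (len(signal) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

# Hypothetical stand-in for one channel of signal 16.
signal_16 = [0.0, 1.0, 2.0, 3.0]

# Three differently scaled copies, standing in for signals 18, 18' and 18''.
scaled = {ratio: stretch(signal_16, ratio) for ratio in (0.5, 1.0, 2.0)}
```

Each entry of scaled holds the same waveform lasting for a different number of samples, so a pattern learned at one speaking rate has a chance of lining up with at least one of them.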

As each signal 18, 18' and 18" is presented to feature 
map 19, feature map 19 outputs a value which repre- 
sents how well each signal 18, 18' and 18" matches each 
learned pattern. Since none of the signals 18, 18' or 18" 
match learned pattern 21 well, output 23 will be rela-
tively low. On the other hand, there is a good match
between signal 18" and learned pattern 22. Thus output 
24 will be relatively high. Comparator 26 examines the 
relative values of outputs 23 and 24, recognizes output 
24 as the highest, and indicates this fact with output 27, 
establishing input 11 as being recognized as the pho-
neme "a".
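
The scoring and comparison steps can be sketched as follows. The bit patterns, the position-by-position agreement count used as the match value, and the function names are assumptions chosen for illustration; the text does not specify how the feature map computes its output values.

```python
def match_score(signal, pattern):
    """Count positions where signal and pattern agree.

    zip() stops at the shorter sequence, so extra samples are ignored.
    """
    return sum(1 for s, p in zip(signal, pattern) if s == p)

def feature_map(scaled_signals, learned):
    """One output value per (learned pattern, scaled signal) pair."""
    return {
        (label, index): match_score(signal, pattern)
        for label, pattern in learned.items()
        for index, signal in enumerate(scaled_signals)
    }

def comparator(outputs):
    """Return (label, signal index, value) for the greatest output value."""
    (label, index), value = max(outputs.items(), key=lambda item: item[1])
    return label, index, value

# Hypothetical learned patterns for the phonemes "e" and "a".
learned = {"e": [0, 1, 1, 0], "a": [1, 0, 1, 1]}

# Hypothetical scaled versions of a slowly spoken "a".
scaled_signals = [
    [1, 1, 0, 0, 1, 1, 1, 1],  # stand-in for signal 18
    [1, 0, 0, 1, 1, 1],        # stand-in for signal 18'
    [1, 0, 1, 1],              # stand-in for signal 18''
]

outputs = feature_map(scaled_signals, learned)
best = comparator(outputs)
```

Here only the most compressed signal lines up fully with the learned "a", so the comparator reports that pattern as the best match.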
FIG. 2 illustrates the operation of time-scale section
17. The duration of digital input signal 16 of FIG. 1 is
expanded by a factor "H" by artificially adding data
samples by means of interpolator 31. The output of
interpolator 31 is smoothed by low-pass filter 32. The
duration of the output of low-pass filter 32 is com-
pressed by a factor "L" by removing data samples by
means of decimator 33. The net scaling is then H/L.
The operation of time-scaling circuit 17 is described in
detail by Crochiere, R. E. and Rabiner, L. R., Multirate
Digital Signal Processing, Prentice-Hall, New Jersey,
1983, pp 39-42, and by Rabiner, L. R. and Schafer, R.
W., Digital Processing of Speech Signals, Prentice-Hall,
New Jersey, 1978, pp 27-30, which descriptions are
hereby incorporated herein by reference.
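
The interpolate, filter, decimate chain can be sketched in Python as shown below. This is a minimal sketch, not the cited designs: zero insertion is assumed for the interpolator, and a triangular FIR kernel (equivalent to linear interpolation) is assumed for the low-pass filter, whereas a practical filter would be designed for the chosen H and L. The parameter names up and down correspond to the factors H and L.

```python
def interpolate(samples, up):
    """Expand duration by `up` (factor H): insert up - 1 zeros after each sample."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (up - 1))
    return out

def low_pass(samples, up):
    """Smooth a zero-stuffed signal with a triangular kernel of length 2 * up - 1.

    Assumption: this simple kernel only fills in the inserted zeros by
    linear interpolation; it is not a sharp anti-aliasing filter.
    """
    kernel = [1 - abs(k) / up for k in range(-(up - 1), up)]
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = n + j - (up - 1)
            if 0 <= idx < len(samples):
                acc += weight * samples[idx]
        out.append(acc)
    return out

def decimate(samples, down):
    """Compress duration by `down` (factor L): keep every down-th sample."""
    return samples[::down]

def time_scale(samples, up, down):
    """Net scaling of up / down, that is H / L."""
    return decimate(low_pass(interpolate(samples, up), up), down)

doubled = time_scale([0, 2, 4, 6], up=2, down=1)   # net scaling 2:1
halved = time_scale([0, 2, 4, 6], up=1, down=2)    # net scaling 0.5:1
```

For the ramp input, doubled comes out as [0, 1, 2, 3, 4, 5, 6, 3]: the inserted samples are filled in by the filter, and the final value falls off because the kernel runs past the end of the data.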

FIG. 3 illustrates the use of a delay line in conjunc-
tion with the feature map to perform pattern recogni-
tion. Signal 18" is clocked into multistage delay line 36.
Signal 18", now held in delay line 36, is then compared
to learned patterns in feature map segment 37. Note that
one bit of learned pattern 21 matches signal 18". Output
23 reflects this fact. Learned pattern 38 matches two
bits of signal 18", and output 41 is appropriately
weighted. Learned pattern 22 is the closest match, with
output 24 reflecting a four-bit match. Finally output 42
indicates the lack of any matching bits between learned
pattern 39 and signal 18". Recall that signal 18" is a
multi-channel, multi-bit signal. The binary representa-
tion used herein is not to be construed as a limitation,
but is used as an illustrative example only.
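
A delay line holding a signal for comparison can be modeled in Python as a fixed-length shift register. The bit values below are invented so that the match counts agree with those stated in the text (one, two, four and zero matching bits); the actual patterns drawn in FIG. 3 are not reproduced here, and the class and function names are hypothetical.

```python
from collections import deque

class DelayLine:
    """Multistage delay line modeled as a fixed-length shift register."""

    def __init__(self, stages):
        self.taps = deque([0] * stages, maxlen=stages)

    def clock_in(self, bit):
        """Shift one new sample in; the oldest sample falls off the far end."""
        self.taps.append(bit)

    def contents(self):
        return list(self.taps)

def matching_bits(held, pattern):
    """Number of stages whose held bit equals the learned pattern's bit."""
    return sum(1 for h, p in zip(held, pattern) if h == p)

# Hypothetical four-stage line holding a stand-in for signal 18''.
line = DelayLine(stages=4)
for bit in [1, 0, 1, 1]:
    line.clock_in(bit)

# Hypothetical learned patterns, keyed by their reference numerals.
learned = {
    21: [1, 1, 0, 0],
    38: [1, 0, 0, 0],
    22: [1, 0, 1, 1],
    39: [0, 1, 0, 0],
}

outputs = {ref: matching_bits(line.contents(), pattern)
           for ref, pattern in learned.items()}
```

The resulting outputs dictionary maps pattern 22 to the highest value, mirroring the four-bit match that output 24 reflects.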

FIG. 4 illustrates an alternate embodiment of the
present invention, highlighting two specific features.
The first is that fast Fourier transform circuit 14 is elimi-
nated. In an application such as real-time handwriting
recognition, frequency variations are not a factor as in
speech recognition, and the transformation from the
time domain to the frequency domain is not necessarily
appropriate. The second feature illustrated by FIG. 4 is
a trade-off between hardware and speed. Separate iden-
tical feature maps 19, 19' and 19" are utilized to look at
each of the outputs of time-scale section 17. Thus the
comparisons of each one of signals 18 to the learned
patterns of the feature map are accomplished in parallel,
greatly enhancing the speed of the neural net. The out-
puts of feature maps 19, 19' and 19" are compared by
comparators 26, 26' and 26", respectively. A final deci-
sion as to the best match is then made by comparator 28.
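
The two-stage comparison of this embodiment can be mimicked in software as follows. This sketch assumes a thread pool as a stand-in for the separate hardware feature maps, and reuses an invented agreement-count score and invented bit patterns; Python threads only mirror the structure and do not reproduce the speed advantage of parallel hardware.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical learned patterns shared by the identical feature maps.
LEARNED = {"e": [0, 1, 1, 0], "a": [1, 0, 1, 1]}

def match_score(signal, pattern):
    """Count positions where signal and pattern agree (extra samples ignored)."""
    return sum(1 for s, p in zip(signal, pattern) if s == p)

def map_and_compare(signal):
    """One feature map plus its own comparator: best (score, label) for one signal."""
    return max((match_score(signal, pattern), label)
               for label, pattern in LEARNED.items())

def recognize(scaled_signals):
    """Run one feature map per scaled signal, then make the final decision."""
    with ThreadPoolExecutor(max_workers=len(scaled_signals)) as pool:
        local_winners = list(pool.map(map_and_compare, scaled_signals))
    score, label = max(local_winners)  # stand-in for the final comparator
    return label, score

# Hypothetical outputs of the time-scale section.
scaled_signals = [
    [1, 1, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 1],
    [1, 0, 1, 1],
]

result = recognize(scaled_signals)
```

Because the winners are compared as (score, label) tuples, a tie in score would be broken alphabetically by label, which is an artifact of this sketch rather than anything stated in the text.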

By now it should be apparent that an improved neural
net architecture has been provided which provides for
the recognition of an input signal which is a rate variant
of a learned signal pattern. The duration of a digital
sampling of the input signal is scaled by a time-scaling
network, creating a multiplicity of scaled signals which
are then compared to memorized signal patterns con-
tained in a self-organizing feature map. The feature map
outputs values which indicate how well the scaled input
signals match various learned signal patterns. A com-
parator determines which one of the values is greatest,
thus indicating a best match between the input signal
and one of the learned signal patterns. The training
requirements for the neural net feature map are thereby
greatly reduced.

What is claimed is: 

1. A self neural net architecture for rate-varying input
signals, comprising:
means for sampling a rate-varying input signal, the
input signal having an initial duration, the means
for sampling having an input and an output, the
input of the means for sampling receiving the rate-
varying input signal, the means for sampling out-
putting a sampled signal pattern;

means for time-scaling the sampled signal pattern, the
means for time-scaling having an input and an out-
put, the input of the means for time-scaling being
coupled to the output of the means for sampling,
the means for time-scaling producing a scaled sig-
nal pattern;

a feature map for comparing the scaled signal pattern
to a stored signal pattern, the feature map having
an input and an output, the input of the feature map
being coupled to the output of the means for time-
scaling; and

means for determining a correct match between the
scaled signal pattern and the stored signal pattern,
the means for determining a correct match having
an input and an output, the input of the means for
determining a correct match being coupled to the
output of the feature map, the output indicating the
realization of a correct match as appropriate.

2. The neural net architecture of claim 1, wherein the 
means for sampling comprises an analog-to-digital con- 
verter which generates digital samples of the rate-vary- 
ing input signal. 

3. The neural net architecture of claim 2, wherein the 
means for sampling further comprises a fast Fourier
transform circuit coupled to the analog-to-digital con-
verter such that the fast Fourier transform circuit re-
ceives the digital samples, the fast Fourier transform
circuit transforming the digital samples from the time
domain to the frequency domain. 

4. The neural net architecture of claim 1, wherein the 
means for time-scaling comprises: 

an interpolator, which expands the duration of the 
sampled signal pattern; 

a low-pass filter which smooths the sampled signal 
pattern received from the interpolator; and 

a decimator which compresses the time duration of
the sampled signal pattern received from the low-
pass filter, producing a scaled signal pattern whose
net scaling is a function of the ratio of the time
expansion by the interpolator to the time compres-
sion by the decimator.

5. A neural net architecture which provides for the
recognition of an input signal which is a rate variant of
a learned signal pattern, comprising:

an analog-to-digital converter which creates a digital 
sampling of the input signal, the digital sampling 
having an initial duration; 

a plurality of interpolators which expand the duration 
of the digital sampling by at least a first factor, each 
one of the plurality of interpolators outputting an 
expanded digital sampling; 

a plurality of low-pass filters, each one of the plural- 
ity of low-pass filters corresponding to one of the 
plurality of interpolators, each one of the plurality 
of low-pass filters serving to smooth the expanded
digital sampling output by the corresponding one 
of the plurality of interpolators; 

a plurality of decimators, each one of the plurality of 
decimators corresponding to one of the plurality of 
low-pass filters, each one of the plurality of
decimators serving to compress the duration of the 
expanded digital sampling smoothed by the corre- 
sponding one of the plurality of low-pass filters by 
at least a second factor, resulting in a plurality of 
scaled digital samplings; 
a feature map which contains a plurality of learned
signal patterns, to which the plurality of scaled
digital samplings are compared, the feature map
outputting a plurality of values representing how
well each of the plurality of scaled digital sam- 
plings match each of the plurality of learned pat- 
terns; and 

a comparator for determining which one of the plu- 
rality of values is greatest, thus indicating a best 
match between the input signal and one of the 
plurality of learned signal patterns. 
6. The neural net architecture of claim 5 wherein a 
fast Fourier transform circuit transforms the digital 
sampling generated by the analog-to-digital converter 
from a time domain representation of the input signal to 
a frequency domain representation of the input signal
prior to expansion by the plurality of interpolators.
7. A method for enabling a neural net feature map to
recognize an input signal which is a rate variant of a
learned signal pattern, comprising:
sampling the input signal with an analog-to-digital
converter, the sampled signal having an initial du-
ration;

scaling the duration of the sampled signal by a scaling 
ratio by means of a method for scaling, comprising: 

expanding the time duration of the sampled signal by 
a first factor by means of an interpolator; 

filtering the expanded signal by means of a low-pass 
filter;

compressing the time duration of the expanded signal 
by a second factor by means of a decimator, pro- 
viding a scaled signal, the scaling ratio being the 
ratio of the first factor to the second factor; 

comparing the scaled signal to learned signal patterns 
stored in the neural net feature map; and 

selecting the learned signal pattern from the neural 
net feature map which most closely matches the 
scaled signal. 